Mixed-precision parallel linear solver #7048
SoilRos left a comment
Overall, it looks good; I just left some minor concerns. I still need to run the performance tests, though.
 * @param A Pointer to bsr matrix.
 * @param x Pointer to input vector.
 * @param y Pointer to output vector.
 */
If I understand the implementation correctly, the blocks must already be transposed. If that is the case, it should be stated in this part of the documentation. I think the same holds for the other matrix-vector product.
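A scalar reference for the product documented above may help pin down what "transposed" means here. This is a hypothetical sketch (`BsrMatrix3` and `bsr_vmult` are not names from the PR) that assumes each 3x3 block is stored as nine floats in column-major order, so that `AA+0`, `AA+3` and `AA+6` point at the block's three columns, and that results are accumulated in double precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical BSR container with 3x3 blocks. Block values are stored in
// single precision and transposed: vals[9*k + 3*j + i] holds entry (i, j)
// of block k, so each consecutive triple is one column of the block.
struct BsrMatrix3 {
    int nrows;                 // number of block rows
    std::vector<int> rows;     // block-row pointers, size nrows + 1
    std::vector<int> cols;     // block-column index of each stored block
    std::vector<float> vals;   // 9 floats per block, column-major
};

// y = A * x: reads float blocks, accumulates in double.
void bsr_vmult(const BsrMatrix3& A, const double* x, double* y)
{
    assert(static_cast<int>(A.rows.size()) == A.nrows + 1);
    for (int r = 0; r < A.nrows; ++r) {
        double acc[3] = {0.0, 0.0, 0.0};
        for (int k = A.rows[r]; k < A.rows[r + 1]; ++k) {
            const float* AA = &A.vals[9 * static_cast<std::size_t>(k)];
            const double* xb = x + 3 * A.cols[k];
            for (int j = 0; j < 3; ++j)        // column j of the block ...
                for (int i = 0; i < 3; ++i)    // ... scaled by x_j
                    acc[i] += static_cast<double>(AA[3 * j + i]) * xb[j];
        }
        for (int i = 0; i < 3; ++i)
            y[3 * r + i] = acc[i];
    }
}
```

With the non-symmetric block [[1,2,0],[0,1,0],[0,0,3]] stored as `{1,0,0, 2,1,0, 0,0,3}` and x = (1,1,1), the result is (3,1,3); reading the same array as row-major would give (1,3,3), which is why the expected layout is worth documenting.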
vA[0] += _mm256_cvtps_pd(_mm_loadu_ps(AA+0))*_mm256_permute4x64_pd(vx,0b00000000); //0b01010101
vA[1] += _mm256_cvtps_pd(_mm_loadu_ps(AA+3))*_mm256_permute4x64_pd(vx,0b01010101); //0b01010101
vA[2] += _mm256_cvtps_pd(_mm_loadu_ps(AA+6))*_mm256_permute4x64_pd(vx,0b10101010); //0b01010101
What are those comments for? Maybe just put a single comment above these lines saying that this is the transposed block matrix-vector product.
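For reference, `_mm256_permute4x64_pd(v, imm)` fills output lane `i` with input lane `(imm >> 2*i) & 3`, so the immediates `0b00000000`, `0b01010101` and `0b10101010` broadcast `x[0]`, `x[1]` and `x[2]` respectively; the trailing `//0b01010101` comments only match the second line. The scalar model below is a sketch under the same column-major block assumption: `permute4x64` and `transposed_block_mv` are hypothetical helpers, the three accumulators `vA[0..2]` are folded into one, and lane 3 of each 4-wide load (which in the SIMD code picks up a neighboring value) is not modelled.

```cpp
#include <array>
#include <cassert>

// Scalar model of _mm256_permute4x64_pd(v, imm): output lane i takes the
// input lane selected by bits [2i+1:2i] of imm.
std::array<double, 4> permute4x64(const std::array<double, 4>& v, int imm)
{
    assert(imm >= 0 && imm < 256);
    std::array<double, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = v[(imm >> (2 * i)) & 3];
    return out;
}

// Scalar model of the three quoted lines for one 3x3 block stored
// transposed (column j starts at AA + 3*j): acc += sum_j column_j * x_j.
void transposed_block_mv(const float* AA, const std::array<double, 4>& vx,
                         std::array<double, 4>& acc)
{
    const int imm[3] = {0b00000000, 0b01010101, 0b10101010}; // broadcast x0, x1, x2
    for (int j = 0; j < 3; ++j) {
        const std::array<double, 4> xb = permute4x64(vx, imm[j]);
        for (int i = 0; i < 3; ++i)   // lane 3 is not modelled
            acc[i] += static_cast<double>(AA[3 * j + i]) * xb[i];
    }
}
```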
{
    cols[icol++] = row->getindexptr()[i];
}
rows[irow+1] = rows[irow]+row->getsize();
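The quoted loop fills the block-column indices `cols` and the row pointers `rows` of a CSR-style sparsity pattern. A self-contained sketch of the iterator-based variant follows; it uses a hypothetical stand-in matrix (one `std::map` from column to value per row) instead of the actual ISTL matrix types, so `it->first` plays the role of the column iterator's `index()`. The column index comes from the iterator, and `rows[irow+1]` is taken from the running counter `icol` rather than from a per-row size query.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a sparse matrix: one ordered map
// (column index -> value) per row.
using SparseRows = std::vector<std::map<int, double>>;

// Build CSR row pointers and column indices by walking each row's nonzeros.
void build_csr_pattern(const SparseRows& A, std::vector<int>& rows,
                       std::vector<int>& cols)
{
    rows.assign(A.size() + 1, 0);
    cols.clear();
    int icol = 0;
    for (std::size_t irow = 0; irow < A.size(); ++irow) {
        for (auto it = A[irow].begin(); it != A[irow].end(); ++it) {
            cols.push_back(it->first);   // column index from the iterator
            ++icol;
        }
        rows[irow + 1] = icol;           // running count, no size query
    }
    assert(static_cast<int>(cols.size()) == icol);
}
```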
For the new matrix, it would be better to write this loop with the iterators and their index() method instead of the row's internal member functions:
for(auto col = row->begin(); col != row->end(); ++col)
{
cols[icol++] = col.index();
}
rows[irow+1] = icol;

//! @tparam Vector the block-vector used by linear operator
//! @tparam b block size
template <class Vector, int b>
class MatrixWrapper
Could you add this to a namespace?
Also, from the point of view of a user of this class, it is irrelevant whether or not it wraps the C implementation. What matters is that its 3x3 blocks are stored downcast to single precision and vectorized. So I would call it something else, for example MixedMatrixWrapper.
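To make the suggestion concrete, here is a hypothetical sketch of a namespaced wrapper whose name reflects the user-visible behavior: double-precision 3x3 blocks are downcast to single precision and stored transposed. The namespace `Opm::Mixed`, the members `addBlock` and `entry`, and the column-major layout are assumptions for illustration, and the `Vector` template parameter of the original class is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

namespace Opm::Mixed { // hypothetical namespace name

// Keeps b x b blocks in single precision, each block transposed
// (column-major) so that its columns are contiguous.
template <int b = 3>
class MixedMatrixWrapper {
public:
    using Block = std::array<std::array<double, b>, b>; // row-major input block

    // Downcast one block to float and append it column by column.
    void addBlock(const Block& blk)
    {
        for (int j = 0; j < b; ++j)
            for (int i = 0; i < b; ++i)
                vals_.push_back(static_cast<float>(blk[i][j]));
    }

    // Entry (i, j) of block k, read back from the transposed float storage.
    float entry(std::size_t k, int i, int j) const
    {
        const std::size_t pos = k * b * b + static_cast<std::size_t>(j) * b + i;
        assert(pos < vals_.size());
        return vals_[pos];
    }

private:
    std::vector<float> vals_;
};

} // namespace Opm::Mixed
```

Callers only see "single-precision, vectorization-friendly block storage", which is the property the name is meant to convey.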
This PR leverages OPM's ISTL framework to provide a parallel implementation of the mixed-precision linear solver. For more information, refer to
opm/simulators/linalg/mixed/README.md